Querying Provenance for Ranking and Recommending

Authors

  • Zachary G. Ives
  • Andreas Haeberlen
  • Tao Feng
  • Wolfgang Gatterbauer
Abstract

As has been frequently observed in the literature, there is a strong connection between a derived data item’s provenance and its authoritativeness, utility, relevance, or probability. A standard way of obtaining a score for a derived tuple is to first assign scores to the “base” tuples from which it is derived, and then use the semantics of the query and the score measure to derive a value for the tuple. This “provenance-enabled” scoring has led to a variety of scenarios where tuples’ intrinsic value is based on their provenance, independent of whatever other tuples exist in the data set. However, there is another class of applications, revolving around sharing and recommendation, in which our goal may be to rank tuples by their “importance” or the structure of their connectivity within the provenance graph. We argue that the most natural approach is to exploit the structure of a provenance graph to rank and recommend “interesting” or “relevant” items to users, based on global and/or local provenance graph structure and random walk-based algorithms. We further argue that it is desirable to have a high-level declarative language to extract portions of the provenance graph and then apply the random walk computations. We extend the ProQL provenance query language to support a wide array of random walk algorithms in a high-level way, and identify opportunities for query optimization.

Disciplines: Computer Sciences

Comments: Ives, Z., Haeberlen, A., Feng, T., & Gatterbauer, W., Querying Provenance for Ranking and Recommending, 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP'12), June 2012, https://www.usenix.org/conference/tapp12/querying-provenance-ranking-and-recommending

This conference paper is available at ScholarlyCommons: http://repository.upenn.edu/cis_papers/605

Affiliations: Zachary G. Ives, Andreas Haeberlen, and Tao Feng, Computer and Information Science Department, University of Pennsylvania ({zives,ahae,fengtao}@cis.upenn.edu); Wolfgang Gatterbauer, Tepper School of Business, Carnegie Mellon University.
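The abstract contrasts two ways of scoring tuples. In provenance-enabled scoring, a derived tuple's value is computed from the scores of its base tuples. In the alternative, tuples are ranked by random walks over the provenance graph. The Python sketch below illustrates both on a toy graph. It is not ProQL and it is not the algorithm from the paper. The following are all assumptions made for illustration:

- the data layout;
- the derivation-node names;
- the max-times (Viterbi) semiring used for the first score;
- the random walk with restart (personalized PageRank), with restart probability `alpha = 0.15`, used for the second.

```python
from collections import defaultdict

# Hypothetical in-memory provenance graph: each derived tuple maps to its
# alternative derivations, and each derivation lists the tuples it combined.
Derivations = dict[str, list[list[str]]]


def provenance_score(t: str, base: dict[str, float], derivs: Derivations) -> float:
    """Provenance-enabled score in the max-times (Viterbi) semiring.

    Inputs of one derivation are multiplied (joint use); alternative
    derivations are combined with max. Assumes the graph is acyclic.
    """
    if t in base:
        return base[t]
    best = 0.0  # a tuple with no derivation gets the semiring zero
    for inputs in derivs.get(t, []):
        prod = 1.0
        for u in inputs:
            prod *= provenance_score(u, base, derivs)
        best = max(best, prod)
    return best


def to_adjacency(derivs: Derivations) -> dict[str, list[str]]:
    """Undirected view of the graph: tuple nodes plus one node per derivation."""
    adj: dict[str, list[str]] = defaultdict(list)
    for out, alternatives in derivs.items():
        for i, inputs in enumerate(alternatives):
            d = f"{out}#d{i}"  # hypothetical naming for derivation nodes
            for u in [*inputs, out]:
                adj[u].append(d)
                adj[d].append(u)
    return adj


def random_walk_rank(adj: dict[str, list[str]], seeds: set[str],
                     alpha: float = 0.15, iters: int = 100) -> dict[str, float]:
    """Random walk with restart (personalized PageRank) by power iteration.

    At each step the walker jumps back to a seed with probability alpha,
    otherwise it moves to a uniformly chosen neighbor.
    """
    nodes = sorted(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            share = (1 - alpha) * rank[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        rank = nxt
    return rank


# Toy data (hypothetical): base tuples a, b, c; x is derived from a and b or
# from c alone; y is derived from x joined with c.
BASE = {"a": 0.9, "b": 0.5, "c": 0.8}
DERIVS: Derivations = {"x": [["a", "b"], ["c"]], "y": [["x", "c"]]}

rank = random_walk_rank(to_adjacency(DERIVS), seeds={"a"})
tuples = ["a", "b", "c", "x", "y"]
print({t: provenance_score(t, BASE, DERIVS) for t in tuples})
print(sorted(tuples, key=rank.get, reverse=True))
```

The two scores behave differently:

- The semiring score of `x` is fixed by its own derivations, independent of any user.
- The walk score of `x` depends on which seed the walk restarts from. Changing `seeds` reorders the ranking but leaves the semiring scores untouched.

The walk also assigns probability to derivation nodes, so the sketch filters to tuple nodes before ranking.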

Similar resources

Selective Provenance for Datalog Programs Using Top-K Queries

Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of proven...

Temporal Provenance Model (TPM): Model and Query Language

Provenance refers to the documentation of an object’s lifecycle. This documentation (often represented as a graph) should include all the information necessary to reproduce a certain piece of data or the process that led to it. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Supporting time-a...

OPQL: Querying scientific workflow provenance at the graph level

Article history: Received 21 December 2011 Received in revised form 30 August 2013 Accepted 31 August 2013 Available online xxxx Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the result of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query...

Managing and using provenance in the semantic web

The Web contains some extremely valuable information; however, often poor quality, inaccurate, irrelevant or fraudulent information can also be found. With the increasing amount of data available, it is becoming more and more difficult to distinguish truth from speculation on the Web. One of the most, if not the most, important criterion used to evaluate data credibility is the information sour...

Approaches for Exploring and Querying Scientific Workflow Provenance Graphs

While many scientific workflow systems track and record data provenance, few tools have been developed that provide convenient and effective ways to access and explore this information. Two important ways for provenance information to be accessed and explored are through browsing (i.e., visualizing and navigating data and process dependencies) and querying (e.g., to select certain portions of pr...

Publication date: 2012